Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate timing counts on-the-fly + support dynamic events #49901

Merged
merged 4 commits into from
Jun 2, 2023

Conversation

topolarity
Copy link
Member

@topolarity topolarity commented May 19, 2023

This paves the way for a Julia-side API that can create new events on-the-fly without having to modify timing.h

The core of the change is to split our profiling data into two different structs:

  • jl_timing_event_t: stores statically-determined attributes of a profiler event (typically, zone name and source location info)
  • jl_timing_block_t: stores the dynamic information relevant to a particular span/measurement in the timing run

Events and timing blocks have a one-to-many relationship, and both can be allocated dynamically. The intended pattern for a Julia-side API is to construct an event once at parse-time using "jl_timing_event_create" (since this is relatively expensive due to profiler traffic and allocations) and then to create its own timing block on-the-fly for each block entry/exit.

P.S. @pchintalapudi I'm planning to re-base this on top of #49679 as soon as that lands Done

@topolarity topolarity changed the title Allocate all timing events dynamically Allocate timing counts on-the-fly + support dynamic events May 19, 2023
src/timing.c Outdated Show resolved Hide resolved
src/timing.c Outdated
Comment on lines 160 to 203
for (size_t i = 0; i < n; i++) {
jl_timing_counts_event_t *other_event = (jl_timing_counts_event_t *)jl_timing_counts_events.items[i];
if (strcmp(event, other_event->name) == 0)
return other_event;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see this be a hash table lookup or something faster than linear time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this event-creation overhead is going to matter much until we start creating many more events (we have less than 100 now)

Can that be tackled in a follow-up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can push this to a follow up.

src/timing.c Outdated
#endif
}

JL_DLLEXPORT jl_timing_event_t *jl_timing_event_create(const char *subsystem, const char *name, const char *function, const char *file, int line, int color) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should deduplicate at this point, rather than relying on each individual event system to handle deduplication. I think at this point we should only have a 1:1 mapping of event name to source location, so there isn't a reason to only deduplicate for counts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already de-duplicate at the source location level, but that's the caller's responsibility

For any de-duplication that goes beyond that, I think splitting to the back-ends makes sense since they all use slightly different information.

For counts specifically, being able to re-use event names in multiple locations seems like a nice feature, and the de-duplication gives us the flexibility to group counts by subsystem in the future if we want to, so I don't really want to rip out the counts event de-dupe either.

src/timing.h Outdated Show resolved Hide resolved
src/timing.c Outdated Show resolved Hide resolved
src/timing.c Outdated Show resolved Hide resolved
src/timing.h Outdated Show resolved Hide resolved
@topolarity topolarity force-pushed the julia-side-timing branch from e65346e to 15b9c94 Compare May 20, 2023 00:32
src/timing.c Show resolved Hide resolved
src/timing.c Outdated Show resolved Hide resolved
@topolarity topolarity force-pushed the julia-side-timing branch 3 times, most recently from cadbf29 to a1c7f04 Compare May 25, 2023 20:41
@topolarity topolarity force-pushed the julia-side-timing branch from a1c7f04 to 7d9ec06 Compare May 25, 2023 20:53
@topolarity
Copy link
Member Author

topolarity commented May 25, 2023

This does regress the timing counts back-end somewhat, since the events output is no longer sorted.

I'll take a look tomorrow and see if I can add sorting before we print. Done.

@KristofferC
Copy link
Member

I think the wrong zones are disabled with this change since there is now a mismatch between getting the sorted index when the zone is created vs getting the unsorted index when DISABLE_SUBSYSTEM is run.

Something like 903ce26 seems to fix it.

@topolarity topolarity force-pushed the julia-side-timing branch from 7d9ec06 to 58320fa Compare May 26, 2023 13:12
@topolarity
Copy link
Member Author

topolarity commented May 26, 2023

I think the wrong zones are disabled with this change since there is now a mismatch between getting the sorted index when the zone is created vs getting the unsorted index when DISABLE_SUBSYSTEM is run.

Looks like a rebasing casualty from yesterday - Thanks for the heads up!

Should be fixed now, and event sorting is also back in.

This paves the way for a Julia-side API that can create new events
on-the-fly without having to modify timing.h

The core of the change is to introduce two different structs:
  - An "event" stores all of the statically-determined attributes of a
    profiler event (typically, zone name and source location info)
  - A "timing block" stores the dynamic information relevant to a
    particular span/measurement in the timing run

Events and timing blocks have a one-to-many relationship. The intended
pattern for a Julia-side API is to construct an event once at parse-time
using `jl_timing_event_create` (since this is relatively expensive due
to profiler traffic and allocations) and then to create its own timing
block on-the-fly for each block entry/exit.

This also re-factors the API a bit to hopefully be more consistently
named
@topolarity topolarity force-pushed the julia-side-timing branch from 58320fa to 3c915f3 Compare June 1, 2023 20:25
@topolarity
Copy link
Member Author

VTune has been tested with a bit of help from @gbaraldi (thanks!) and is looking OK

Timing counts and Tracy are also looking good.

This change should be good to land once it gets review.

@topolarity topolarity added the merge me PR is reviewed. Merge when all tests are passing label Jun 1, 2023
@topolarity topolarity merged commit a35e53d into JuliaLang:master Jun 2, 2023
@topolarity topolarity deleted the julia-side-timing branch June 2, 2023 03:25
@topolarity topolarity removed the merge me PR is reviewed. Merge when all tests are passing label Jun 2, 2023
vtjnash referenced this pull request Jun 12, 2023
* fixup for 49996 and add test from 50065

---------

Co-authored-by: Lilith Hafner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants